5.1 Introduction to Asymptotics

1 Introduction

In the above example, we want to test $H_0: \beta_1 = 0$, so we need to condition on $X_{-1}^T y$ (as discussed in testing with nuisance parameters). But that would essentially condition on $y$ itself.
If we want to estimate $\beta$, a UMVU estimator generically does not exist, and a Bayes estimator requires a prior on $\beta \in \mathbb{R}^d$.
Software packages use general-purpose asymptotic methods:
$$\hat\beta_{MLE}(x, y) = \arg\max_{\beta \in \mathbb{R}^d} p_\beta(y \mid x) = \arg\max_{\beta \in \mathbb{R}^d} \Big\{ \beta^T X^T y - \sum_{i=1}^n A(\beta; x_i) \Big\}.$$
Asymptotically, $\hat\beta_{MLE} \approx N(\beta, J(\beta)^{-1})$, where $J(\beta)$ is the Fisher information.

The mean $\beta$ appears because $\hat\beta_{MLE}$ is asymptotically unbiased, and the covariance comes from the curvature of the log-likelihood: $-\nabla^2 \ell(\hat\beta; X, y) \approx E_\beta[-\nabla^2 \ell(\beta; X, y)] = J(\beta)$, so $\hat\Sigma = (-\nabla^2 \ell(\hat\beta))^{-1} \approx \Sigma(\beta) = J(\beta)^{-1}$.

So $Z_j = \frac{\hat\beta_j - \beta_j}{\hat\sigma_j} \approx N(0, 1)$. To test $H_0: \beta_j = 0$, reject if $\hat\beta_j / \hat\sigma_j$ is too large/small/extreme.
We can invert the test [1]: $|Z_j| < z_{\alpha/2} \iff \beta_j \in \hat\beta_j \pm z_{\alpha/2} \hat\sigma_j$, giving a confidence interval.
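The Wald test and interval above can be sketched numerically. The following is a minimal illustration, not from the notes: the data, seed, and sample sizes are arbitrary choices, and the logistic MLE is computed by plain Newton-Raphson.

```python
# Sketch: Wald z-statistics and confidence intervals for logistic-regression
# coefficients, using the approximation beta_hat ~ N(beta, J(beta)^{-1}).
# Simulated data and all constants here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 3
X = rng.normal(size=(n, d))
beta_true = np.array([0.5, -1.0, 0.0])       # third coordinate: H0 holds
p = 1 / (1 + np.exp(-X @ beta_true))
y = rng.binomial(1, p)

# Newton-Raphson for the logistic MLE (log-likelihood is concave)
beta = np.zeros(d)
for _ in range(50):
    mu = 1 / (1 + np.exp(-X @ beta))
    grad = X.T @ (y - mu)                    # score vector
    H = X.T @ (X * (mu * (1 - mu))[:, None]) # observed information
    beta += np.linalg.solve(H, grad)

Sigma_hat = np.linalg.inv(H)                 # estimate of J(beta)^{-1}
se = np.sqrt(np.diag(Sigma_hat))
z = beta / se                                # Wald statistics for H0: beta_j = 0
z_crit = 1.96                                # z_{alpha/2} for alpha = 0.05
ci = np.column_stack([beta - z_crit * se, beta + z_crit * se])
print(z)
print(ci)
```

With this simulated data, the first two coordinates give large $|Z_j|$ (reject) while the third is typically near 0.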


So far, everything has been finite-sample, often using special properties of the model $\mathcal{P}$ (like exponential family structure) to do exact calculations.
For "general" models, exact calculations may be intractable or impossible. But we may be able to approximate our problem with a simpler problem in which calculations are easy.
Typically we approximate by a Gaussian, taking a limit in the number of observations. But this is only interesting if the approximation is good at "reasonable" sample sizes.

2 Probability Recall

2.1 Convergence

Let $X_1, X_2, \ldots \in \mathbb{R}^d$ be a sequence of random vectors. We care about two kinds of convergence: convergence in distribution (weak convergence), written $X_n \Rightarrow X$, and convergence in probability, written $X_n \xrightarrow{p} X$.

Theorem (Weak Convergence)

For $X_1, X_2, \ldots \in \mathbb{R}$, let $F_n(x) = P(X_n \le x)$ and $F(x) = P(X \le x)$. Then $X_n \Rightarrow X$ iff $F_n(x) \to F(x)$ for all $x$ at which $F$ is continuous.

Example: if $X_n \sim \delta_{1/n}$ and $X \sim \delta_0$, then $X_n \Rightarrow X$, even though $F_n(0) = 0$ does not converge to $F(0) = 1$ (the point $x = 0$ is a discontinuity of $F$).
Proposition

$X_n \xrightarrow{p} c \iff X_n \Rightarrow \delta_c$.

In a sequence of statistical models $\mathcal{P}_n = \{P_{n,\theta} : \theta \in \Theta\}$ with $X_n \sim P_{n,\theta}$, we say $\delta_n(X_n)$ is consistent for $g(\theta)$ if $\delta_n(X_n) \xrightarrow{P_\theta} g(\theta)$, meaning $P_\theta(\|\delta_n(X_n) - g(\theta)\| > \varepsilon) \to 0$ for every $\varepsilon > 0$. Usually we omit the index $n$, because the sequence is implicit.

2.2 Limit Theorems

Denote $\bar X_n = \frac{1}{n} \sum_{i=1}^n X_i$. We are familiar with the law of large numbers, $\bar X_n \xrightarrow{p} \mu$, and the central limit theorem, $\sqrt{n}(\bar X_n - \mu) \Rightarrow N(0, \sigma^2)$.
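Both limit theorems are easy to see by simulation. A sketch with illustrative Exp(1) data ($\mu = 1$, $\sigma^2 = 1$); the sample sizes and seed are arbitrary:

```python
# Sketch: LLN and CLT for the sample mean of Exp(1) data.
import numpy as np

rng = np.random.default_rng(2)
n, reps = 400, 5000
X = rng.exponential(size=(reps, n))     # mu = 1, sigma^2 = 1
xbar = X.mean(axis=1)
print(xbar.mean())                      # LLN: close to mu = 1
z = np.sqrt(n) * (xbar - 1.0)           # CLT: approximately N(0, 1)
print(z.std(), np.mean(z <= 1.6449))    # sd near 1; P(Z <= z_0.95) near 0.95
```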

3 Continuous Mapping, Delta Method

Theorem (Continuous Mapping)

Let $g$ be a continuous function and $X_1, X_2, \ldots$ a sequence of random variables.

  • If $X_n \Rightarrow X$, then $g(X_n) \Rightarrow g(X)$.
  • If $X_n \xrightarrow{p} c$, then $g(X_n) \xrightarrow{p} g(c)$.
Theorem (Slutsky)

Assume $X_n \Rightarrow X$ and $Y_n \xrightarrow{p} c$. Then $X_n + Y_n \Rightarrow X + c$, $X_n Y_n \Rightarrow cX$, and $X_n / Y_n \Rightarrow X / c$ if $c \neq 0$.
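A standard application of Slutsky is the t-statistic: $\sqrt{n}(\bar X_n - \mu) \Rightarrow N(0, \sigma^2)$ and the sample standard deviation $S_n \xrightarrow{p} \sigma$, so the ratio converges to $N(0, 1)$. A sketch with an illustrative Uniform(0, 2) distribution:

```python
# Sketch: Slutsky's theorem behind the t-statistic.
# Numerator => N(0, sigma^2), denominator S_n ->p sigma, ratio => N(0, 1).
import numpy as np

rng = np.random.default_rng(3)
n, reps = 200, 5000
X = rng.uniform(0, 2, size=(reps, n))   # mu = 1, sigma^2 = 1/3
t = np.sqrt(n) * (X.mean(axis=1) - 1.0) / X.std(axis=1, ddof=1)
print(t.mean(), t.std())                # approximately 0 and 1
```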

Theorem (Delta Method)

If $\sqrt{n}(X_n - \mu) \Rightarrow N(0, \sigma^2)$ and $f$ is differentiable at $x = \mu$, then $\sqrt{n}(f(X_n) - f(\mu)) \Rightarrow N(0, \dot f(\mu)^2 \sigma^2)$.

Informal statement:
If $X_n \approx N(\mu, \frac{\sigma^2}{n})$, then $f(X_n) \approx N(f(\mu), \dot f(\mu)^2 \frac{\sigma^2}{n})$.
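The first-order delta method can be checked by simulation. A sketch with the illustrative choice $f(x) = x^2$ and Exp(1) data, so $\mu = 1$, $\sigma^2 = 1$, $\dot f(1) = 2$, and the limiting standard deviation of $\sqrt{n}(f(\bar X_n) - f(\mu))$ is 2:

```python
# Sketch: first-order delta method with f(x) = x^2 (illustrative choice).
# sqrt(n)(f(X_bar) - f(mu)) => N(0, f'(mu)^2 sigma^2) = N(0, 4) here.
import numpy as np

rng = np.random.default_rng(4)
n, reps = 1000, 5000
xbar = rng.exponential(size=(reps, n)).mean(axis=1)   # mu = 1, sigma^2 = 1
w = np.sqrt(n) * (xbar**2 - 1.0)
print(w.std())                                        # close to 2
```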

In general, we can use higher-order Taylor expansions in the delta method when lower-order derivatives are 0.

$$f(X_n) \approx f(\mu) + \dot f(\mu)(X_n - \mu) + \frac{\ddot f(\mu)}{2}(X_n - \mu)^2 + \cdots$$

If $\dot f(\mu) = 0$, use the second-order term: $n(f(X_n) - f(\mu)) \approx \frac{\ddot f(\mu)}{2} \left(\sqrt{n}(X_n - \mu)\right)^2 \Rightarrow \frac{\ddot f(\mu) \sigma^2}{2} \chi^2_1$.
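The second-order case can also be simulated. A sketch with the illustrative choice $f(x) = x^2$ at $\mu = 0$ (so $\dot f(0) = 0$, $\ddot f(0) = 2$) and $N(0, 1)$ data, where the limit $\frac{\ddot f(0) \sigma^2}{2} \chi^2_1 = \chi^2_1$ has mean 1:

```python
# Sketch: second-order delta method with f(x) = x^2 at mu = 0.
# n(f(X_bar) - f(0)) => (f''(0)/2) sigma^2 chi^2_1 = chi^2_1 here.
import numpy as np

rng = np.random.default_rng(5)
n, reps = 500, 5000
xbar = rng.normal(0, 1, size=(reps, n)).mean(axis=1)  # mu = 0, sigma^2 = 1
w = n * xbar**2                                       # approximately chi^2_1
print(w.mean())                                       # chi^2_1 has mean 1
```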


  1. Recall this section. ↩︎